Cell Arrangement Workflow (CAW)

This workflow was developed to provide more complex data analysis to the table outputted by the GE Research Use Only system. The workflow walks the user through a number of data processing stages through the use of Jupyter Notebook that is included in the Anaconda 3 Python Environment. These data processing stages include:

1) Initiate Python Packages – The user will initiate and load relevant Python packages into the Jupyter Notebook.

2) Input Session Information – The user will identify them along with relevant information pertaining to the data analysis session.

3) Assign and Load the Files – The user will adjust the inputs of the code to the appropriate files and folders that are to be used.

4) Select the Biomarkers, the Slide and Regions, and if to include Cell if found in Stroma/Epithelial mask – The user will select, from the available biomarkers mentioned in the GE file, which ones will be included in the analysis. They will also select the slides that will be included along with the specific regions in each cell. The user will also have an opportunity here to specify if they want to include only cells found in the Stroma/Epithelial mask.

5) Cell Quality Control Parameters – The user will identify acceptable the criteria that a cell needs to pass to be included in the final analysis. This section includes criteria such as the slide and region for which the cell is located, the minimum GE quality score acceptable, and the identity of the cell as a stroma or epithelial cell, and various morphological features.

6) Cell Biomarker Expression Data Settings – The user will define what biomarker intensities data transformations need to be and define the appropriate data processing procedure; thresholding, K-means clustering, or continuous variables.

7) Data Processing – The user will initiate the data processing based on the settings selected in the earlier stages.

8) Data Visualization – The results of the data processing are dynamically displayed to the user.

In [2]:
from IPython.display import display, Markdown, Javascript
from ipywidgets import widgets
import WidgetFunctionsDimaEdit as WidgFunc

display(Markdown('## Running Through Jupyter Notebook'))


display(Markdown('To run a cell, click on the cell such that it is highlighted:'))

file = open("ScreenCaptureIms\SelectCell.png", "rb")
selimg = file.read()
selimgwid = widgets.Image(value=selimg,
                            format='png')
display(selimgwid)
 
display(Markdown('To execute the cell either press and hold the __Ctrl + Enter__ or __Shift + Enter__ keys together, or press the _"run cell, select below"_ button:'))

file = open("ScreenCaptureIms\RunCell.png", "rb")
selimg = file.read()
runimgwid = widgets.Image(value=selimg,
                            format='png')
display(runimgwid)
 

display(Markdown('# 1) Initiate Python Packages'))
display(Markdown('Python Packages are the coding libraries that provide the computational tools used in this workflow. This might take some time to fully load. Please select and run the next cell.'))

display(Markdown('When all of the packages are properly imported, the following message will appear:'))

display(Markdown('<h2><center>All Python packages have been imported.</center></h2>'))

display(Markdown('It is also important to note that there may be some warning box that will precede the "_All Python packages have been imported._" message. Please ignore them. They are of no consequence to the running of the workflow.'))

display(Markdown('This cell will also setup the dictionary of user settings that will track all of the user defined parameters.'))


# Run next code Cell
display(Javascript('IPython.notebook.execute_cell_range(IPython.notebook.get_selected_index(), IPython.notebook.get_selected_index()+1)'))

Running Through Jupyter Notebook

To run a cell, click on the cell such that it is highlighted:

To execute the cell either press and hold the Ctrl + Enter or Shift + Enter keys together, or press the "run cell, select below" button:

1) Initiate Python Packages

Python Packages are the coding libraries that provide the computational tools used in this workflow. This might take some time to fully load. Please select and run the next cell.

When all of the packages are properly imported, the following message will appear:

All Python packages have been imported.

It is also important to note that there may be some warning box that will precede the "All Python packages have been imported." message. Please ignore them. They are of no consequence to the running of the workflow.

This cell will also setup the dictionary of user settings that will track all of the user defined parameters.

In [26]:
# Packages:
# Data Organization and Processing:
import pandas as pd
# remove the warning for indexing that is really annoying with pandas
pd.set_option('mode.chained_assignment', None)
import numpy as np
import re
from collections import OrderedDict

# Machine Learning
from sklearn.cluster import KMeans, DBSCAN
from sklearn import metrics

# Custom-Built Supporting Functions
import GEFileFunctions
import WidgetFunctionsDimaEdit_working as WidgFunc
import ConcatFunctions_VHE_20180913 as Conc
#import ConcatFunctions4

# Settings:
# Display up to 50 columns
pd.set_option('display.max_columns', 50)
import os
import datetime

# Display Modules
from IPython.display import Latex, clear_output, Javascript
from ipywidgets import interact, fixed, widgets, Layout, interactive
from IPython.display import HTML
import traitlets
from tkinter import Tk, filedialog

# Visualization:
import matplotlib.pyplot as plt
import matplotlib.colors as pltcol
import cv2

import plotly.offline as py
from plotly.offline import enable_mpl_offline, iplot_mpl
py.init_notebook_mode(connected = True)
enable_mpl_offline()

import plotly.graph_objs as go
from plotly.widgets import GraphWidget
import colorlover as cl

# Building Settings Dictionary
QualSettingsDict = OrderedDict({
    'SessionInfo':{
        'User':[],
        'Date':[],
        'ProjectName':[],
        'UserProjNotes':[],
    },
    'FileInfo':{
        'GEFileName': [],
        'DAPIimFileName': [],
        'OutDataFolder':[],
    },
    'ProcessingSettings':{'BMs':{},
        'slide_regs_original': [],
        'epithelial': [],
        
    },
    'CellAcceptRanges':{
        'Perimeter': [],
        'Eccentricity': [],
        'MajorAxisLength': [],
        'Nuc_Area': [],
        'Cyt_Area': [],
        'Memb_Area': [],
        'Cell_Area': [],
    }
})


# Display Python Package Importing Complete message
display(Markdown('## All Python packages have been imported.'))

WidgFunc.SimpleClickProceedSettings()


#--- Calling Toggle ----
# Hide or display code button definition
HideShowCodeButton = HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click to toggle on/off raw code"></form>''')

# Display Hide or display code button
HideShowCodeButton

All Python packages have been imported.

Out[26]:
In [5]:
display(Markdown('# 2) Input Session Information'))
display(Markdown('The following cell will produce a quick digital form to fill out such that the project information can be saved together.'))
display(Markdown('At the end of the form, a "Lock Settings" button will appear. Please click it to ensure that all of the inputs are properly saved. '))

# Run next code Cell
WidgFunc.SimpleClickProceedSettings()

2) Input Session Information

The following cell will produce a quick digital form to fill out such that the project information can be saved together.

At the end of the form, a "Lock Settings" button will appear. Please click it to ensure that all of the inputs are properly saved.

In [6]:
#declare global variables here to communicate with function library properly
global SessionDate
global User
global ProjectName
global ProjectNotes

SessionDate = widgets.DatePicker(
    value = datetime.date.today(),
    description = 'Date of analysis session:',
    style = {'description_width':'initial'},
)

User = widgets.Text(
    description = 'User:',
    style = {'description_width':'initial'}
)

############global ProjectName

ProjectName = widgets.Text(
    description = 'Project Name:',
    style = {'description_width':'initial'}
)

ProjectNotes = widgets.Textarea(
    description = 'Project Notes:',
    style = {'description_width':'initial'}
)

display(Markdown('#### Please fill in the following information:'))
display(User)
display(SessionDate)
display(ProjectName)
display(ProjectNotes)



#----- Lock Settings Toggle ------------
global NumberLocked1
global ProjNameErr_bool
NumberLocked1 = 0
ProjNameErr_bool = False


#----- Error Message Widget ------------
# this is a really ridiculous hack to be able to open and close error messages at will
# I couldn't figure out the display text widgets so I created an empty progress bar that never fills
# and I just open and close the text/progress bar title at will. Very hacky solution.... :(
def INITProjNameErr():
    global ProjNameErr
    ProjNameErr = widgets.FloatProgress(
                description='<font color=red size=3>Name Error: Please Input Project Name.</font>',
                bar_style='info',
                orientation='horizontal',
                style = {'description_width':'initial'}
                )


# Function to save User Inputs everytime Lock Settings is Clicked
def ExecuteOnLock():
    if (ProjectName.value != ''):
        QualSettingsDict['SessionInfo']['Date'] = SessionDate.value
        QualSettingsDict['SessionInfo']['ProjectName'] = ProjectName.value
        QualSettingsDict['SessionInfo']['User'] = User.value
        QualSettingsDict['SessionInfo']['UserProjNotes'] = ProjectNotes.value
        # Run next cell if not run yet - stop it from re-running cell below
        global NumberLocked1
        if NumberLocked1 ==0:
            # Run next code cell
            display(Javascript('IPython.notebook.execute_cell_range(IPython.notebook.get_selected_index()+1, IPython.notebook.get_selected_index()+2)'))
        NumberLocked1 = NumberLocked1 + 1
    if (ProjectName.value == ''):
        INITProjNameErr()
        display(ProjNameErr)
        global ProjNameErr_bool
        ProjNameErr_bool = True

# Close error message on unlock to clear panel
def ExecuteOnUnlock():
    if ProjNameErr_bool is True:
        ProjNameErr.close()
    return None


#--- Calling Toggle ----
WidgetList = [User, SessionDate, ProjectName, ProjectNotes]
WidgFunc.SimpleToggleLockSettings(WidgetList, ExecuteOnLock, ExecuteOnUnlock)

Please fill in the following information:

In [7]:
display(Markdown('# 3) Assign and Load the Files'))
display(Markdown('Please run the cell below and select the following:')) 
display(Markdown('* GE table to be processed'))
display(Markdown('* GE file\'s corresponding DAPI image (for visualization purposes)'))
display(Markdown('* a folder that you would like all the output material sent to'))

# Run next code Cell
WidgFunc.SimpleClickProceedSettings()

3) Assign and Load the Files

Please run the cell below and select the following:

  • GE table to be processed
  • GE file's corresponding DAPI image (for visualization purposes)
  • a folder that you would like all the output material sent to
In [8]:
# Get the GE file from user
GEfile = WidgFunc.SelectFilesButton()
# Get the DAPI image from user
DAPIfile = WidgFunc.SelectOneFileButton()
# Get the output folder from user
OutputFold = WidgFunc.SelectFolderButton()
# Get the output file format from user
OutputFileFormat = widgets.RadioButtons(options=['.csv', '.xlsx'],
                                        value='.csv',
                                        description='Output file format:',
                                        style = {'description_width':'initial'},
                                        disabled=False
                                        )

#-------- Displaying widgets and instructions --------
display(widgets.Label('Please select the GE file(s) that contain the table(s) to be analysed:'))
display(GEfile)
display(widgets.Label('Please select the desired output file format (note: .xlsx will take longer to run than .csv):'))
display(OutputFileFormat)
display(widgets.Label('Please select the DAPI image file for visualization purposes:'))
display(DAPIfile)
display(widgets.Label('Please select the output folder, where all results will be saved:'))
display(OutputFold)

#------- Lock Settings toggle --------
global NumberLocked2a
global INITFileUploadErr_bool
global INITFoldUploadErr_bool
INITFileUploadErr_bool = False
NumberLocked2a = 0
INITFoldUploadErr_bool = False

def INITFileUploadErr():
    global FileUploadErr
    FileUploadErr = widgets.FloatProgress(
                description='<font color=red size=3>File Error: Please upload all files.</font>',
                bar_style='info',
                orientation='horizontal',
                style = {'description_width':'initial'})

def INITFoldUploadErr():
    global FoldUploadErr
    FoldUploadErr = widgets.FloatProgress(
                description='<font color=red size=3>Error: Please select an output folder.</font>',
                bar_style='info',
                orientation='horizontal',
                style = {'description_width':'initial'})    

def ExecuteOnLock():
    try:
        OutputFold.fold != []
    except AttributeError:
        INITFoldUploadErr()
        display(FoldUploadErr)
        global FoldUploadErr_bool
        FoldUploadErr_bool = True
        return

    if (GEfile.files != []) & (DAPIfile.files != []):
        # Run next cell if not run yet - stop it from re-running cell below
        global NumberLocked2a
        if NumberLocked2a == 0:
            # Run next code cell
            display(Javascript('IPython.notebook.execute_cell_range(IPython.notebook.get_selected_index()+1, IPython.notebook.get_selected_index()+2)'))
        NumberLocked2a = NumberLocked2a + 1

    if (DAPIfile.files == []) or (GEfile.files == []) or (OutputFold.fold == []):
        INITFileUploadErr()
        display(FileUploadErr)
        global FileUploadErr_bool
        FileUploadErr_bool = True

def ExecuteOnUnlock():
    try:
        if FoldUploadErr_bool is True:
            FoldUploadErr.close()
        if FileUploadErr_bool is True:
            FileUploadErr.close()
    except NameError:
        return
    return None

# Saving settings in dictionary:
WidgetList = [GEfile, OutputFileFormat, DAPIfile, OutputFold]
WidgFunc.SimpleToggleLockSettings(WidgetList, ExecuteOnLock, ExecuteOnUnlock)
In [9]:
#------- Defined ExportOnClick() Button --------
# implement this in the widgets function library
class ExportOnClick(widgets.Button):
    """A file widget that concatenates the input files into one output with some file name
    that may either be initialized by the user or is autogenerated"""

    def __init__(self, *args, **kwargs):
        """Initialize the ExportOnClick class."""
        super().__init__(*args, **kwargs)
        # Add the selected_files trait
        self.add_traits(files=traitlets.traitlets.List())
        # Create the button.
        self.description = "Click to Export"   # RE-NAMNG THIS TO ALLOW US TO APPLY IT FOR MORE 
        self.icon = ""                         # CASES THAN JUST CONCATENATING FILES EXCLUSIVELY
        self.style.button_color = "lightblue" 
        # Set on click behavior.
        self.on_click(self.ClickedOutput)

    @staticmethod
    def ClickedOutput(CoC):
        """Generate instance of tkinter.filedialog.
        Parameters
        ----------
        CoC (concatenate on click): obj:
            An instance of ipywidgets.widgets.Button
        """
        
        
        Conc.ConcatIfMatch(User.value, SessionDate.value, ProjectNotes.value, ProjectName.value, GEfile, OutputFold.fold[0], OutputFileFormat.value)
        CoC.files = GEfile.files

        if CoC.files[0]!= '':
            CoC.description = "Export Success"
            CoC.icon = "check"
            CoC.style.button_color = "lightgreen"


#--------
# initialize this to zero so that concat is only appended to WidgetList when true, not always
# note: we're doing this bc we want ConcatButton to append as the 4th element, not the first
AppendConcat = 0
emptyfiles = False

try:
    if len(GEfile.files) > 0:
        # Allow user to concatenate files
        ConcatButton = ExportOnClick()
        display(widgets.Label('Please click to merge and export input files to proceed.'))
        display(ConcatButton)
        AppendConcat = 1
        
        if ProjectName.value:
            OutputName = ProjectName.value
        elif not ProjectName.value:
            try:
                OutputName = Conc.InNameConcatReturn()  #returns InNameConcat
                progressbarCONCAT.close()

            except NameError:
                pass    # we're doing this to bypass the NameError that pops up when the program tries to run 
                        # InNameConcatReturn() prior to running ConcatIfMatch(). (ConcatIfMatch runs the for loop
                        # that assigns a value to the variable InNameConcat). This can potentially also be fixed via 
                        # seperating the naming for loop from the rest of the for loop but then we're going though
                        # the same for loop twice which is a smidge less efficient (though altogether doesn't make a
                        # difference at all in the grand scheme of things, but is considered good practise, I guess.)
        else:
            print("Some other output name generating error occurred: check function ReadMatchName()")
        
    #elif len(GEfile.files) == 1:
    #    print("Only 1 file uploaded: no concat required.")
    #    COMMENTING THIS OUT AND REWRITING ABOVE IF TO INCLUDE len == 1 TO 
    #    ENSURE EXPORTING RUNS FOR len == 1 AS WELL.
        
    elif len(GEfile.files) == 0:
        print("No files uploaded.")
        emptyfiles = True
        
    else: # shouldn't ever really run...
        print("Some error occurred.")
        
except ValueError:
    print("Some other error occurred.")

    
#----------- Toggle lock settings ------------
# Function to save User Inputs every time Lock Settings is Clicked
NumberLocked2 = 0

def ExecuteOnLock():
    if emptyfiles == True:
        display(Markdown('## <font color=red>Upload Error: Please ensure you\'ve uploaded all necessary files and folders.</font>'))
        return
    if (GEfile.files != []) & (GEfile.files[0] != ''):
        # Saving settings in dictionary:
        QualSettingsDict['FileInfo']['GEFileName'] = GEfile.files[0]
        QualSettingsDict['FileInfo']['DAPIimFileName'] = DAPIfile.files[0]
        QualSettingsDict['FileInfo']['OutDataFolder'] = OutputFold.fold[0]

        # reading data table and displaying
        global GEdf, BMsAvail
        GEdf = pd.read_table(QualSettingsDict['FileInfo']['GEFileName'])

        BMsAvail = GEFileFunctions.BiomarkerNames(GEdf)

        display(Markdown('## Biomarkers Measured:'))
        BMstring = ''
        for BM in BMsAvail:
            BMstring = BMstring + '__'+ BM + '__ | '
        display(Markdown(BMstring))

        display(Markdown('## GE File:'))
        display(GEdf)

        # Defining defaults

        # Slide-region pairings
        slide_reg = []
        for slide in list(set(GEdf['slide'])):
            regs = list(set(GEdf.loc[GEdf['slide']==slide]['region']))
            slide_reg.append([slide,regs])

        QualSettingsDict['ProcessingSettings']['slide_regs_original'] = slide_reg
        QualSettingsDict['ProcessingSettings']['slide_regs_selected'] = slide_reg



        # Epithelial or Stroma - converting to int
        if 'epithelial' in list(GEdf.columns):
            QualSettingsDict['ProcessingSettings']['epithelial'] = list(map(int,list(set(GEdf['epithelial']))))

        if 'stroma' in list(GEdf.columns):
            QualSettingsDict['ProcessingSettings']['stroma'] = list(map(int,list(set(GEdf['stroma']))))

        # qc_score - converting to int
        QualSettingsDict['CellAcceptRanges']['qc_score'] = min(GEdf['qc_score']), max(GEdf['qc_score'])
        # Perimeter
        QualSettingsDict['CellAcceptRanges']['Perimeter'] = min(GEdf['Perimeter']), max(GEdf['Perimeter'])
        # Eccentricity
        QualSettingsDict['CellAcceptRanges']['Eccentricity'] = min(GEdf['Eccentricity']), max(GEdf['Eccentricity'])
        # MajorAxisLength
        QualSettingsDict['CellAcceptRanges']['MajorAxisLength'] = min(GEdf['MajorAxisLength']), max(GEdf['MajorAxisLength'])
        # MinorAxisLength
        QualSettingsDict['CellAcceptRanges']['MinorAxisLength'] = min(GEdf['MinorAxisLength']), max(GEdf['MajorAxisLength'])
        # Perimeter
        QualSettingsDict['CellAcceptRanges']['Nuc_Area'] = min(GEdf['Nuc_Area']), max(GEdf['Nuc_Area'])
        # Perimeter
        QualSettingsDict['CellAcceptRanges']['Cyt_Area'] = min(GEdf['Cyt_Area']), max(GEdf['Cyt_Area'])
        # Perimeter
        QualSettingsDict['CellAcceptRanges']['Memb_Area'] = min(GEdf['Memb_Area']), max(GEdf['Memb_Area'])
        # Perimeter
        QualSettingsDict['CellAcceptRanges']['Cell_Area'] = min(GEdf['Cell_Area']), max(GEdf['Cell_Area'])
        # Run next cell if not run yet - stop it from re-running cell below
        global NumberLocked2
        if NumberLocked2 == 0:
            # Run next markdown cell
            display(Javascript('IPython.notebook.execute_cell_range(IPython.notebook.get_selected_index()+1, IPython.notebook.get_selected_index()+2)'))
        NumberLocked2 = NumberLocked2 + 1
        
    else:
        display(Markdown('## No GE file was properly selected. Please try to select a folder again.'))
    
def ExecuteOnUnlock():
    return None
# --- Calling Toggle ----
WidgetList = [GEfile, DAPIfile, OutputFold]
if AppendConcat == 1:
    WidgetList.append(ConcatButton) #this will hopefully append concat button as the 4th element in the list
    
WidgFunc.SimpleToggleLockSettings(WidgetList, ExecuteOnLock, ExecuteOnUnlock)

Biomarkers Measured:

CK8_18 | COX2 | ER | ER_2 | HER2 | KI67 | NAKATPASE | P16 | P21 | P53 | PR | PR_2 | S6 | S6_2 |

GE File:

CellID slide region epithelial qc_score Perimeter Eccentricity MajorAxisLength MinorAxisLength NominalPostion_X NominalPosition_Y Nuc_Area Cyt_Area Memb_Area Cell_Area CK8_18 COX2 ER ER_2 HER2 KI67 NAKATPASE P16 P21 P53 PR PR_2 S6 S6_2
0 1 20171227DCIS001 spot_015 1.0 1.000000 703.0 0.621742 183.931717 144.059448 2031.0 3421.0 157.0 3438.0 1674.0 7503.0 4514.450 1851.870 1365.560 197.732 11540.50 129.505 1701.570 1701.110 544.190 335.416 671.052 286.576 1417.850 165.096
1 2 20171227DCIS001 spot_015 1.0 1.000000 668.0 0.956405 193.105164 56.395039 1713.0 1087.0 9.0 3154.0 1825.0 6947.0 3562.590 1637.720 1164.160 161.815 10162.00 26.989 2731.770 1514.050 382.211 322.898 685.446 383.420 1160.930 166.333
2 3 20171227DCIS001 spot_015 1.0 1.000000 553.0 0.944681 170.295959 55.855595 179.0 3090.0 0.0 3045.0 1696.0 5547.0 5209.620 1893.660 1321.740 260.240 11734.60 40.630 2786.890 1932.070 709.147 496.298 950.543 351.968 1945.960 198.102
3 4 20171227DCIS001 spot_015 1.0 0.803877 450.0 0.608451 101.794334 80.783127 3022.0 2636.0 4.0 2522.0 403.0 5313.0 1898.180 904.041 878.317 192.729 4126.30 55.397 659.370 1180.520 296.159 231.584 690.247 167.577 1094.710 109.030
4 5 20171227DCIS001 spot_015 1.0 1.000000 501.0 0.932142 163.594254 59.236137 2567.0 2815.0 81.0 2459.0 1144.0 5053.0 3271.970 1960.930 1626.080 234.190 12507.70 57.145 2279.480 1375.390 702.368 327.072 816.349 538.921 1320.050 159.147
5 6 20171227DCIS001 spot_015 1.0 1.000000 428.0 0.938164 133.879990 46.348129 2017.0 1957.0 0.0 2543.0 1340.0 4542.0 4039.000 2110.200 1469.840 230.004 13136.30 60.402 3681.950 1417.810 590.533 416.974 881.545 466.864 1573.770 167.527
6 7 20171227DCIS001 spot_015 1.0 0.799383 518.0 0.906761 129.967026 54.799992 598.0 80.0 0.0 2365.0 551.0 4536.0 2878.670 582.349 473.216 102.608 1512.63 16.668 797.543 2717.540 386.933 525.532 919.245 361.100 2122.050 189.521
7 8 20171227DCIS001 spot_015 0.0 1.000000 423.0 0.726600 98.157280 67.439964 2988.0 917.0 0.0 1665.0 826.0 4493.0 785.890 752.891 873.769 47.276 7132.65 3.277 1397.180 681.014 105.812 42.546 443.300 358.941 574.675 93.464
8 9 20171227DCIS001 spot_015 1.0 1.000000 384.0 0.842511 104.526741 56.306400 461.0 2077.0 0.0 1897.0 1269.0 4323.0 4429.810 1503.370 862.296 170.373 8040.64 13.561 2920.130 2106.870 384.012 338.914 748.372 449.345 1456.920 131.214
9 10 20171227DCIS001 spot_015 1.0 1.000000 386.0 0.178758 77.790970 76.537994 396.0 3258.0 100.0 2483.0 1151.0 4161.0 5151.580 1872.960 1388.600 267.520 11359.80 37.855 2894.100 1750.300 772.772 519.154 1019.150 346.527 2010.620 225.333
10 11 20171227DCIS001 spot_015 1.0 0.804393 376.0 0.893704 115.308876 51.734180 353.0 144.0 65.0 1906.0 1146.0 3870.0 4193.570 1169.940 1287.550 134.019 10551.30 52.960 2134.310 1725.110 586.429 524.828 943.487 420.726 1990.540 220.498
11 12 20171227DCIS001 spot_015 0.0 1.000000 422.0 0.927083 127.560028 47.816719 2989.0 1028.0 5.0 1516.0 655.0 3826.0 872.496 898.369 1057.230 57.798 8289.77 8.553 1551.550 787.208 107.637 52.173 489.192 422.401 488.778 94.941
12 13 20171227DCIS001 spot_015 1.0 0.962194 345.0 0.739340 77.883713 52.441650 2716.0 3043.0 17.0 1640.0 860.0 3756.0 3479.710 1526.490 1473.700 203.649 12144.70 741.045 1862.770 1722.290 462.360 256.810 640.472 262.973 1228.550 161.843
13 14 20171227DCIS001 spot_015 1.0 1.000000 344.0 0.827253 91.004494 51.129066 2148.0 3110.0 0.0 1974.0 710.0 3611.0 4360.500 2560.670 1608.620 359.354 10935.10 46.702 1804.720 1845.080 821.112 563.668 1080.060 407.900 1891.140 189.921
14 15 20171227DCIS001 spot_015 1.0 1.000000 367.0 0.807432 92.506882 54.575424 2196.0 2871.0 0.0 1699.0 1138.0 3281.0 4587.230 2727.880 2388.210 517.598 13950.10 217.064 4554.360 2413.300 1089.240 883.278 1582.540 440.742 2468.710 149.485
15 16 20171227DCIS001 spot_015 1.0 1.000000 345.0 0.365553 71.629684 66.672241 1765.0 3393.0 0.0 1373.0 726.0 3261.0 3592.540 1477.650 1233.120 233.320 9137.28 61.104 2095.670 1835.060 580.738 488.660 1074.060 439.130 1507.990 121.835
16 17 20171227DCIS001 spot_015 1.0 1.000000 460.0 0.896190 97.364983 43.197929 3044.0 840.0 5.0 1656.0 638.0 3202.0 2546.400 1315.150 1669.670 261.509 8260.72 71.039 1635.730 1723.940 419.750 312.146 855.831 369.111 1422.570 119.261
17 18 20171227DCIS001 spot_015 1.0 0.898087 299.0 0.617247 74.311607 58.466099 542.0 1945.0 0.0 1541.0 861.0 3189.0 3559.040 1407.750 1133.150 238.721 8585.18 26.424 2931.720 2994.380 536.283 521.446 969.061 299.871 1880.130 170.088
18 19 20171227DCIS001 spot_015 1.0 1.000000 301.0 0.714059 76.594009 53.622364 1109.0 1496.0 0.0 1502.0 937.0 3013.0 3849.080 1721.570 1247.790 360.961 6254.17 59.929 2821.300 1854.370 741.193 856.354 1353.480 388.863 2262.940 191.273
19 20 20171227DCIS001 spot_015 0.0 1.000000 409.0 0.829706 112.622650 62.866043 3255.0 1657.0 0.0 1102.0 628.0 2983.0 934.416 1057.950 1153.630 90.777 8400.34 13.740 1690.950 523.594 153.965 92.351 518.184 522.742 541.496 98.189
20 21 20171227DCIS001 spot_015 1.0 1.000000 291.0 0.732059 81.186562 55.307590 2368.0 243.0 0.0 1288.0 959.0 2958.0 2325.710 1341.550 1516.560 105.050 15050.90 71.974 2721.850 1278.670 434.403 277.684 696.275 312.347 1418.320 119.764
21 22 20171227DCIS001 spot_015 1.0 1.000000 346.0 0.751508 83.353554 54.990322 2792.0 496.0 0.0 1232.0 711.0 2872.0 2566.820 1409.960 1464.280 193.481 9282.10 109.636 1918.310 1647.300 531.176 374.826 894.933 370.940 1414.990 84.634
22 23 20171227DCIS001 spot_015 1.0 0.136886 439.0 0.848579 94.908600 50.213123 582.0 1777.0 0.0 1361.0 317.0 2871.0 4226.440 1471.430 1323.500 481.135 4056.94 57.200 1995.000 5256.700 906.839 980.894 1681.220 358.563 2884.820 197.189
23 24 20171227DCIS001 spot_015 1.0 1.000000 362.0 0.910159 104.327148 43.218582 1851.0 2224.0 4.0 1882.0 811.0 2850.0 3936.480 2686.670 1537.390 328.888 10547.30 127.458 3208.270 1710.950 699.206 504.029 1090.220 861.545 1747.150 201.642
24 25 20171227DCIS001 spot_015 1.0 1.000000 337.0 0.935016 103.897186 36.842571 2496.0 3021.0 73.0 1543.0 806.0 2813.0 3470.320 2464.330 2092.110 297.673 15721.10 137.771 3590.740 1733.400 800.664 475.710 1105.060 921.518 1555.810 146.331
25 26 20171227DCIS001 spot_015 1.0 1.000000 358.0 0.732305 80.790863 55.016693 1141.0 3477.0 440.0 1422.0 775.0 2800.0 4761.280 2503.990 2309.260 521.636 14160.10 182.233 3205.610 2313.080 1056.780 952.099 1428.900 400.823 2194.330 126.732
26 27 20171227DCIS001 spot_015 1.0 1.000000 381.0 0.900655 99.931831 43.423832 3236.0 2008.0 105.0 1546.0 679.0 2694.0 1992.220 1879.790 1633.260 254.327 8527.39 34.199 1742.310 1055.920 432.409 307.053 890.609 616.987 1437.250 162.549
27 28 20171227DCIS001 spot_015 1.0 1.000000 312.0 0.903146 103.926155 44.618988 706.0 2316.0 174.0 1711.0 1004.0 2586.0 5745.770 3201.000 2245.540 513.066 13730.80 110.597 4223.760 2246.720 1019.020 917.364 1636.460 1254.240 2412.030 167.547
28 29 20171227DCIS001 spot_015 1.0 1.000000 300.0 0.883019 82.631439 38.782021 1062.0 3417.0 0.0 1496.0 876.0 2529.0 4608.380 2137.120 2002.910 298.737 15436.30 322.948 3955.130 1755.200 712.134 511.231 1019.360 408.917 2018.140 267.578
29 30 20171227DCIS001 spot_015 1.0 1.000000 307.0 0.883914 85.591270 40.026676 3228.0 2126.0 86.0 1333.0 956.0 2494.0 2528.070 2133.780 1731.950 257.741 10807.40 69.948 2247.940 1227.600 575.439 306.551 799.721 770.027 1123.860 143.842
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
23148 23248 20171227DCIS001 spot_015 1.0 1.000000 30.0 0.645497 9.797959 7.483315 2144.0 2214.0 0.0 36.0 21.0 51.0 5698.690 2901.880 2942.140 862.000 11961.40 561.863 3532.690 4195.350 2065.760 1651.820 2494.290 136.235 3959.940 252.039
23149 23249 20171227DCIS001 spot_015 1.0 1.000000 35.0 0.865281 13.138568 6.586199 2162.0 1971.0 0.0 0.0 5.0 51.0 4232.470 1215.180 1856.250 494.392 3648.86 167.235 1804.490 912.098 1701.290 1623.750 3004.240 7.255 3246.250 14.902
23150 23250 20171227DCIS001 spot_015 1.0 1.000000 30.0 0.726116 9.749868 6.703736 2175.0 1894.0 0.0 41.0 27.0 51.0 3979.330 1864.730 1994.920 608.549 6760.12 77.529 3370.490 2905.980 1302.550 1333.590 2302.450 328.667 3411.920 153.882
23151 23251 20171227DCIS001 spot_015 1.0 1.000000 36.0 0.827786 12.592970 7.065205 2201.0 529.0 0.0 0.0 0.0 51.0 3486.880 860.922 1939.960 395.039 2509.20 27.765 1785.800 823.020 1673.780 1468.630 3088.550 0.000 3214.840 0.000
23152 23252 20171227DCIS001 spot_015 1.0 1.000000 31.0 0.703197 9.552378 6.791692 2254.0 311.0 22.0 15.0 25.0 51.0 7449.880 6503.860 5181.710 1527.310 24435.00 1358.220 6820.820 5488.290 3256.750 2754.920 3649.530 0.667 3846.840 2.451
23153 23253 20171227DCIS001 spot_015 1.0 1.000000 32.0 0.956705 13.578663 3.952203 2273.0 2584.0 51.0 19.0 23.0 51.0 3989.980 2740.530 6661.470 2177.020 10024.40 603.882 5651.080 4315.940 2482.630 2777.710 3331.020 225.294 2013.140 74.667
23154 23254 20171227DCIS001 spot_015 1.0 1.000000 26.0 0.985583 15.527339 2.627118 2302.0 2224.0 0.0 36.0 20.0 51.0 4374.450 3228.180 2819.410 739.725 12466.30 340.157 5133.940 7452.330 1814.860 1562.570 2754.290 410.235 3532.780 81.588
23155 23255 20171227DCIS001 spot_015 1.0 1.000000 29.0 0.500000 8.000000 6.928203 2309.0 3265.0 0.0 44.0 10.0 51.0 4328.040 4561.330 3276.690 1473.140 9357.80 259.608 4386.490 13456.300 2157.590 2174.670 3168.800 104.078 4613.880 75.569
23156 23256 20171227DCIS001 spot_015 1.0 1.000000 39.0 0.925808 15.652324 5.916482 2357.0 1942.0 0.0 30.0 6.0 51.0 12736.800 3817.100 3505.250 734.373 10168.10 233.176 2689.160 3269.860 4397.920 1060.430 1774.730 538.490 3056.630 152.510
23157 23257 20171227DCIS001 spot_015 1.0 1.000000 33.0 0.730968 11.158630 7.614786 2372.0 378.0 0.0 28.0 40.0 51.0 4766.670 4779.880 8533.920 902.882 57897.80 1235.880 13462.000 3921.570 2878.780 1904.780 2612.040 140.569 4359.900 136.078
23158 23258 20171227DCIS001 spot_015 1.0 1.000000 31.0 0.661693 9.936867 7.450414 2374.0 364.0 8.0 7.0 21.0 51.0 6696.940 7619.450 7272.330 1286.270 44108.60 901.765 7737.100 4451.220 3600.750 2220.410 3441.240 64.392 4533.760 34.314
23159 23259 20171227DCIS001 spot_015 1.0 1.000000 36.0 0.851125 12.458843 6.540431 2372.0 1367.0 0.0 0.0 0.0 51.0 4358.140 1916.200 2444.670 700.235 6192.80 134.686 2154.780 1538.750 1581.730 1345.100 2865.040 0.000 2929.630 4.098
23160 23260 20171227DCIS001 spot_015 1.0 1.000000 36.0 0.799090 13.053961 7.848192 2469.0 3080.0 33.0 2.0 16.0 51.0 6122.450 5929.180 5172.780 1570.390 20763.50 1274.610 5804.350 4363.160 2651.060 2169.100 3318.270 18.274 2815.470 5.745
23161 23261 20171227DCIS001 spot_015 1.0 1.000000 34.0 0.629360 9.540847 7.414327 2534.0 769.0 35.0 0.0 37.0 51.0 3660.220 3594.240 3550.240 1123.820 8129.59 667.529 5378.760 2819.920 2123.220 2225.040 3486.940 0.549 3222.390 10.882
23162 23263 20171227DCIS001 spot_015 1.0 1.000000 43.0 0.944443 17.722059 5.824825 2597.0 1906.0 0.0 16.0 16.0 51.0 3654.820 2970.220 3017.760 843.314 5997.47 229.118 3832.860 2855.240 1871.180 1719.530 3133.570 6.647 3784.510 13.726
23163 23264 20171227DCIS001 spot_015 1.0 1.000000 35.0 0.874965 12.980680 6.285057 2597.0 2672.0 0.0 26.0 1.0 51.0 4007.290 1609.240 2355.650 647.333 6153.29 225.216 1755.240 1739.780 1657.730 1577.670 2651.370 220.961 3125.350 95.000
23164 23265 20171227DCIS001 spot_015 1.0 1.000000 39.0 0.986696 16.278702 2.646484 2697.0 1221.0 29.0 20.0 22.0 51.0 10342.300 6488.290 6792.160 2268.650 32160.80 1424.100 7396.240 5037.570 2950.200 2997.140 3524.940 2.471 4593.510 87.020
23165 23266 20171227DCIS001 spot_015 1.0 1.000000 31.0 0.909249 13.121312 5.461793 2725.0 1410.0 0.0 40.0 18.0 51.0 6799.610 4102.670 4626.290 1083.630 18377.20 333.471 4859.650 3061.240 1690.650 1337.250 2312.690 828.098 2675.450 241.843
23166 23267 20171227DCIS001 spot_015 1.0 1.000000 39.0 0.833834 13.190279 7.281245 2792.0 1016.0 23.0 12.0 23.0 51.0 8500.140 8464.510 7429.530 2360.820 23690.80 1454.570 7483.200 5561.180 3196.550 2826.100 4084.960 97.941 3848.100 23.039
23167 23268 20171227DCIS001 spot_015 1.0 1.000000 31.0 0.774369 11.462785 7.252901 2807.0 1567.0 0.0 25.0 12.0 51.0 2418.270 1098.630 1896.060 468.392 6020.67 23.118 1350.390 2582.390 845.784 577.373 1613.840 475.000 2472.750 197.176
23168 23269 20171227DCIS001 spot_015 1.0 1.000000 49.0 0.919555 18.707203 7.351227 2826.0 1572.0 0.0 8.0 7.0 51.0 2804.270 1040.410 1963.530 594.118 3115.39 28.078 1538.240 1440.100 1427.140 879.235 2288.760 102.529 2892.650 58.137
23169 23270 20171227DCIS001 spot_015 1.0 1.000000 37.0 0.933034 14.667979 5.277347 2851.0 969.0 0.0 19.0 27.0 51.0 3082.860 2422.590 3848.000 940.078 18482.10 334.451 5916.760 3832.140 2026.610 1361.220 2540.220 91.275 3876.410 67.373
23170 23272 20171227DCIS001 spot_015 1.0 1.000000 29.0 0.914670 9.634987 3.894486 2858.0 1533.0 0.0 33.0 8.0 51.0 2795.220 1500.710 1818.670 608.157 3687.00 81.177 1713.940 2673.390 1032.020 970.686 1949.310 435.647 3011.760 213.431
23171 23273 20171227DCIS001 spot_015 1.0 1.000000 34.0 0.972624 11.688540 2.716251 2954.0 775.0 44.0 8.0 16.0 51.0 5143.840 3992.000 7711.860 2542.750 12360.90 486.176 3305.650 5647.590 2186.250 2415.820 2702.750 37.667 2421.570 13.137
23172 23274 20171227DCIS001 spot_015 1.0 1.000000 27.0 0.813055 9.307827 5.418889 2570.0 1224.0 0.0 27.0 27.0 49.0 12199.200 5967.390 5041.080 1158.590 29702.30 588.469 7746.310 3766.140 2072.310 1930.800 2636.920 210.673 4320.670 219.245
23173 23275 20171227DCIS001 spot_015 1.0 1.000000 18.0 0.999938 970.338135 10.764525 1972.0 2007.0 0.0 15.0 14.0 38.0 8972.390 4554.680 4769.580 1810.450 13473.70 387.658 6499.760 3698.870 2565.340 2610.320 3146.080 196.605 3514.370 116.947
23174 23276 20171227DCIS001 spot_015 1.0 1.000000 13.0 0.999995 2258.802246 7.245688 1949.0 1548.0 0.0 14.0 19.0 35.0 8916.140 5720.260 5608.740 1868.510 22174.90 634.600 7104.600 4571.490 2784.170 2872.800 3649.340 10.429 4459.090 80.943
23175 23277 20171227DCIS001 spot_015 1.0 1.000000 16.0 0.581270 7.276281 5.920788 1837.0 2877.0 0.0 32.0 9.0 32.0 13980.600 5665.560 3352.060 924.281 13476.60 107.188 4052.380 2610.690 2260.310 2113.940 2559.190 252.625 4052.000 315.031
23176 23278 20171227DCIS001 spot_015 1.0 1.000000 12.0 0.983365 8.731251 1.585957 65.0 1933.0 0.0 12.0 0.0 19.0 9868.950 4874.680 4514.890 2386.950 7756.68 343.421 3548.630 2564.530 4012.110 4234.470 3077.630 93.684 4304.790 102.632
23177 23279 20171227DCIS001 spot_015 1.0 1.000000 11.0 0.993957 3023.278564 331.858398 1879.0 2666.0 0.0 11.0 2.0 11.0 11685.500 4856.000 3720.180 1038.730 13593.80 176.000 3984.910 2376.820 1721.180 1434.820 2307.090 643.545 3442.090 448.000

23178 rows × 29 columns

In [10]:
display(Markdown('# 4) Select the Biomarkers to be Analysed'))

display(Markdown('The following cell will produce a quick check list containing all of the biomarkers mentioned by the GE file.'))

display(Markdown('Please ensure that ___only___ the biomarkers that you would like to include in the analysis are selected.'))

# Run next code Cell
WidgFunc.SimpleClickProceedSettings()

4) Select the Biomarkers to be Analysed

The following cell will produce a quick check list containing all of the biomarkers mentioned by the GE file.

Please ensure that only the biomarkers that you would like to include in the analysis are selected.

In [11]:
#=================== Select Biomarkers ============================
# Making Widget Containers of check boxes per biomarker
checkboxes1 = []
checkboxes2 = []
cb_container1 = widgets.VBox()
cb_container2 = widgets.VBox()

# Divide Biomarkers into 2 lists for cleaner display
count = 1
for BM in BMsAvail:
    if count%2==0:
        checkboxes2.append(widgets.Checkbox(description = BM, value=True))
    else:
        checkboxes1.append(widgets.Checkbox(description = BM, value=True))
    count = count+1

# for loops might not be necessary (leftover from older code)
cb_container1.children=[i for i in checkboxes1]
cb_container2.children=[i for i in checkboxes2]

# Allowing both vertical boxes to be displayed side by side
cb_container = widgets.HBox([cb_container1,cb_container2])

# Display instructions and widgets
display(Markdown('#### Please select the Biomarkers to include:'))
display(cb_container)

#=========== Select Slides and Regions ===================================
# Function to disable regions if the owning slide is unselected 
# (avoid possible human error)
def SlideCheckDisable(args):
    slide = args['owner'].description
    for i, sld in enumerate(QualSettingsDict['ProcessingSettings']['slide_regs_original']):
        if slide == sld[0]:
            index = i
    # extract relevant slide container
    RelSlideCont = slideCont.children[index]
    RegsCont = RelSlideCont.children[3]
    
    # Disable region checks if user deselects slide
    if args['new'] == False:
        for i in range(len(RegsCont.children)):
            RegsCont.children[i].disabled = True
    else:
        for i in range(len(RegsCont.children)):
            RegsCont.children[i].disabled = False

# Build widgets for selecting slide
# Containers
slideCont = widgets.VBox()
regionContLayout = Layout(
    border = 'solid',
    display = 'flex',    
)

# Widgets
# Regions:
SlideWidgs = [] # save for disable and reenabled
AllSlides = []
for i,slide in enumerate(QualSettingsDict['ProcessingSettings']['slide_regs_original']):
    slideW = []
    slideW.append(widgets.Label('Slide ' + slide[0]+ ':'))
    # making checkbox of region
    slidecheck = widgets.Checkbox(
        description = slide[0],
        value = True
    )
    # attach function
    slidecheck.observe(SlideCheckDisable,'value')
    
    slideW.append(slidecheck)
    SlideWidgs.append(slidecheck)
    regionCont = widgets.HBox()
    regionW = []
    for region in QualSettingsDict['ProcessingSettings']['slide_regs_original'][i][1]:
        regionW.append(widgets.Checkbox(
            description = region,
            value = True,
        ))
    
    # add region widgets
    SlideWidgs = SlideWidgs + regionW
    
    regionCont.children = regionW
    regionCont.layout = regionContLayout
    
    slideW.append(widgets.Label('Associated Regions: '))
    slideW.append(regionCont)
    slideW.append(widgets.Label('---------------------------------------------'))
    
    FullSlideCont = widgets.VBox()
    FullSlideCont.children = slideW
    
    AllSlides.append(FullSlideCont)
    
    
slideCont.children = AllSlides

# Display instructions and widgets
display(Markdown('#### Select which slides and their regions to include in the analysis:'))
display(slideCont)

# =============== Select based on cell located in epithelial/stroma mask ============================================
# when there is evidence in the GE file that either a stroma or epithelial mask 
# was included in the original analysis, 

# function to save settings when radio buttons are clicked
def UpdateEpiStromaCells(args):
    desc = args['owner'].description
    if desc == 'Epithelial:':
        if args['new'] == 'Include All Cells':
            QualSettingsDict['ProcessingSettings']['epithelial'] = [0, 1]
        elif args['new'] == 'Only Include Epithelial Cells':
            QualSettingsDict['ProcessingSettings']['epithelial'] = [1]
            
    if desc == 'Stroma:':
        if args['new'] == 'Include All Cells':
            QualSettingsDict['ProcessingSettings']['stroma'] = [0, 1]
        elif args['new'] == 'Only Include Stroma Cells':
            QualSettingsDict['ProcessingSettings']['stroma'] = [1]

# Building widgets
EpiStromaWidgs = () # list to save widgets built during this section so that they can be added to lock/unlock button
if 'epithelial' in list(QualSettingsDict['ProcessingSettings']):
    EpiStromaRad = widgets.RadioButtons(
        options = ['Include All Cells','Only Include Cells found in Epithelial Segmentation Mask'],
        value = 'Include All Cells',
        description = 'Epithelial:'
    )
    EpiStromaRad.observe(UpdateEpiStromaCells, 'value')
    display(widgets.Label('Epithelial Cell Setting:'))
    display(EpiStromaRad)
    
    # add to widget list
    EpiStromaWidgs = EpiStromaWidgs + tuple([EpiStromaRad]) + tuple(SlideWidgs)
    

if 'stroma' in list(QualSettingsDict['ProcessingSettings']):
    EpiStromaRad = widgets.RadioButtons(
        options = ['Include All Cells','Only Include Stroma Cells'],
        value = 'Include All Cells',
        description = 'Stroma:'
    )
    EpiStromaRad.observe(UpdateEpiStromaCells)
    display(widgets.Label('Stroma Cell Setting:'))
    display(EpiStromaRad)
    # add to widget list
    EpiStromaWidgs = EpiStromaWidgs + tuple([EpiStromaRad])



#-------- on toggle -------------
global NumberLocked3
NumberLocked3 = 0

def ExecuteOnLock():
# Gather selected biomarkers and save setting
    for i in range(len(cb_container.children)):
        for c in range(len(cb_container.children[i].children)):
            cb_container.children[i].children[c].disabled = True
            if cb_container.children[i].children[c].value == True:
                QualSettingsDict['ProcessingSettings']['BMs'][cb_container.children[i].children[c].description] = {
                    'LabelMethod':[],
                    'DataTransform':[],
                }
    # Gather which slides and regions to include in analysis
    QualSettingsDict['ProcessingSettings']['slide_regs_selected'] = []
    slide_regs = []
    for i in range(len(slideCont.children)):
        slideCont.children[i].children[1].disabled = True
        regsfinal = []
        for j in range(len(slideCont.children[i].children[3].children)):
            slideCont.children[i].children[3].children[j].disabled = True
            if slideCont.children[i].children[3].children[j].value == True:
                regsfinal.append(slideCont.children[i].children[3].children[j].description)

        if slideCont.children[i].children[1].value == True:
            slidefinal = slideCont.children[i].children[1].description
            QualSettingsDict['ProcessingSettings']['slide_regs_selected'].append([slidefinal, regsfinal])

    # Run next cell if not run yet - stop it from re-running cell below
    global NumberLocked3
    if NumberLocked3 ==0:
        # Run next markdown cell
        display(Javascript('IPython.notebook.execute_cell_range(IPython.notebook.get_selected_index()+1, IPython.notebook.get_selected_index()+2)'))
    NumberLocked3 = NumberLocked3 + 1
        
def ExecuteOnUnlock():
    # Unlock settings
    for i in range(len(slideCont.children)):
        # reenable region checks
        if slideCont.children[i].children[1].value == False:
            
            for j in range(len(slideCont.children[i].children[3].children)):
                slideCont.children[i].children[3].children[j].disabled = True

#--- Calling Toggle ----
WidgetList = cb_container.children[0].children[:] + cb_container.children[1].children[:] + EpiStromaWidgs
WidgFunc.SimpleToggleLockSettings(WidgetList, ExecuteOnLock, ExecuteOnUnlock)

Please select the Biomarkers to include:

Select which slides and their regions to include in the analysis:

In [12]:
display(Markdown('# 5) Cell Quality Control Parameters'))
display(Markdown('Choose the Quality Control parameters of the cell. The Cell Quality Control paremters are primarily comprised of cell morphology measurements including:'))
display(Markdown('* Cell Nuclear Area'))
display(Markdown('* Cell Membrane Area'))
display(Markdown('* Total Cell Area'))
display(Markdown('* Total Cell Perimeter'))
display(Markdown('* Total Cell Eccentricity'))
display(Markdown('* Total Cell Major Axis Length'))
display(Markdown('* Total Cell Minor Axis Length'))
display(Markdown('* Quality Control Score (provided by GE Layers*)'))

display(Markdown('All area, length and perimeter metrics are in Pixel units.'))


# Run next code Cell
WidgFunc.SimpleClickProceedSettings()

5) Cell Quality Control Parameters

Choose the Quality Control parameters of the cell. The Cell Quality Control paremters are primarily comprised of cell morphology measurements including:

  • Cell Nuclear Area
  • Cell Membrane Area
  • Total Cell Area
  • Total Cell Perimeter
  • Total Cell Eccentricity
  • Total Cell Major Axis Length
  • Total Cell Minor Axis Length
  • Quality Control Score (provided by GE Layers*)

All area, length and perimeter metrics are in Pixel units.

In [13]:
# filter selected slides and regions
# Isolating selected slides and regions:
slideDF = GEdf

NewDFlist = []
for slidereg in QualSettingsDict['ProcessingSettings']['slide_regs_selected']:
    intermDF = slideDF[(slideDF['slide']== slidereg[0])]
    for reg in slidereg[1]:
        NewDFlist.append(intermDF[(intermDF['region']== reg)])

NewDF1 = pd.concat(NewDFlist)

# Update NewDF to reflect epithelial/stroma settings:
if 'epithelial' in list(QualSettingsDict['ProcessingSettings']):
    if QualSettingsDict['ProcessingSettings']['epithelial'] != [0, 1]:
        NewDF1 = NewDF1[(NewDF1['epithelial']== QualSettingsDict['ProcessingSettings']['epithelial'][0])]
        
global NewDF
NewDF = NewDF1
        
QualSettingsDict['ProcessingSettings']['epithelial']
#============== Building plotly trace of acceptable vs. rejected cells proportion==============

def updatePie(accept):
    
    TotalPie = go.Pie(labels = ['Acceptable Cells','Rejected Cells'], 
                      values = [accept, (100 - accept)],
                      marker = dict(colors = ['#1f77b4','#ff7f0e']),
                    
                     )
    Fig = go.Figure(data = [TotalPie])
    py.iplot(Fig)
    display(Markdown('## Number of Cells Included: ' + str(round(accept/100 * len(GEdf)))+ ' [cells]'))
    

# initialize acceptance value
initAccept = (len(NewDF1)/len(GEdf))*100

# Dummy slide widget that allows for the pie graphs interactive inharritance
widgSlide = widgets.FloatSlider(
    description = 'Percentage Accepted', 
    min = 0,
    max = 100,
    readout = True,
    layout = Layout(width = '100%'),
    value = initAccept,
    disabled = True,
)


interactive_plot = interactive(updatePie, accept = widgSlide)

#=============== Building widgets to  ========================================================
def GEdf2PlotlyHistoRangeWslider(geCol,verbose):
    # building trace
    Trace = go.Histogram(
        x = GEdf[geCol],
        name = verbose,
    )
    HistoLayout = go.Layout(
        title = verbose + ' Values',
        xaxis = dict(
            range = [min(GEdf[geCol]),max(GEdf[geCol])]
        )
    )
    Fig = go.Figure(data = [Trace], layout = HistoLayout)
    py.iplot(Fig)
    
    # Building slider:
    Slide = widgets.FloatRangeSlider(
        description = verbose + ':',
        min = min(GEdf[geCol]),
        max = max(GEdf[geCol]),
        readout = True,
        layout = Layout(width = '100%'),
        value = QualSettingsDict['CellAcceptRanges'][geCol],
        style = {'description_width':'initial'}

    )
    # updating the cell range when a slide changes
    def UpdateCellAcceptRanges(args):
        QualSettingsDict['CellAcceptRanges'][geCol] = args['new']
        # Update The new dataframe
        global NewDF
        NewDF = NewDF1

        for rangeset in QualSettingsDict['CellAcceptRanges']:
            NewDF = NewDF[(NewDF[rangeset]>= 
                          QualSettingsDict['CellAcceptRanges'][rangeset][0])
                         & (NewDF[rangeset]<=
                         QualSettingsDict['CellAcceptRanges'][rangeset][1])]
        # Update the pie graph
        widgSlide.value = (len(NewDF)/len(GEdf))* 100
        
    Slide.observe(UpdateCellAcceptRanges,'value')
    
    display(Slide)
    
    return Slide    
    

#=============== Execute Visualizations and Widgets =======================
AllSlides = []
if list(set(NewDF['qc_score'])) == [0,1]:
    scoretitle = 'Quality Control Score [0, 1]'
else:
    scoretitle = 'Quality Control Score [dimensionless score]'
AllSlides.append(GEdf2PlotlyHistoRangeWslider('qc_score',scoretitle))
AllSlides.append(GEdf2PlotlyHistoRangeWslider('Nuc_Area','Nuclear Area [Pixels]'))
AllSlides.append(GEdf2PlotlyHistoRangeWslider('Memb_Area','Membrane Area [Pixels]'))
AllSlides.append(GEdf2PlotlyHistoRangeWslider('Cyt_Area','Cytoplasm Area [Pixels]'))
AllSlides.append(GEdf2PlotlyHistoRangeWslider('Cell_Area','Cell Area [Pixels]'))

AllSlides.append(GEdf2PlotlyHistoRangeWslider('Perimeter','Cell Perimeter [Pixels]'))
AllSlides.append(GEdf2PlotlyHistoRangeWslider('Eccentricity','Cell Eccentricity [dimensionless]'))
AllSlides.append(GEdf2PlotlyHistoRangeWslider('MajorAxisLength','Cell Major Axis Length  [Pixels]'))
AllSlides.append(GEdf2PlotlyHistoRangeWslider('MinorAxisLength','Cell Minor Axis Length  [Pixels]'))
display(interactive_plot) #displaying the interactive pie chart
widgSlide.close() #remove dummy widget!

#=============== Lock Slides Toggle ================================
#-------- on toggle -------------
global NumberLocked4
NumberLocked4 = 0

def ExecuteOnLock():
    global NumberLocked4
    if NumberLocked4 ==0:
        # Run next markdown cell
        display(Javascript('IPython.notebook.execute_cell_range(IPython.notebook.get_selected_index()+1, IPython.notebook.get_selected_index()+2)'))
    NumberLocked4 = NumberLocked4 + 1
        
def ExecuteOnUnlock():
    return None

#--- Calling Toggle ----
WidgetList = AllSlides
WidgFunc.SimpleToggleLockSettings(WidgetList, ExecuteOnLock, ExecuteOnUnlock)
In [14]:
display(Markdown('# 6) Cell Biomarker Expression Data Settings'))
display(Markdown('From the Biomarkers selected in Section 3, please select the __Labelling Method__, each method\'s associated settings,and the __Data Transformation Option__ procedure per biomarker.'))

display(Markdown('## Labelling Method:'))
display(Markdown('There are currently 3 labelling methods available:'))

display(Markdown('__1) Hard Thresholding by Biomarker__ - Where the biomarker is manually labelled where if the intensity value is less than the first threshold, the label will be __0__ and if it is greater than the first threshold, it will be labelled as __1__, etc.'))
display(Markdown('To hard code a threshold, simply place the numbber of thresholds in the _[biomarker] Number of Thresholds:_ integer box. That number of float boxes will appear underneath.'))

"""file = open("ScreenCaptureIms\HardThresholding.png", "rb")
hardimg = file.read()
hardimgwid = widgets.Image(value=hardimg,                                  ########## FIGURE OUT WHY THIS DOESNT WORK LATER
                            format='png')
display(hardimgwid)"""

display(Markdown('It is important to note that the first default threshold is the first quartile valur of the biomarker\'s intensity values.'))

display(Markdown('__2) K-Means by Biomarker__ - The labelling is done by the K-Means unsupervised learning algorithm, where only that biomarker is assessed for clustering at a time. Thereby, the K-Means is assessing where is the best clustering threshold. _(https://bigdata-madesimple.com/possibly-the-simplest-way-to-explain-k-means-algorithm/)_.'))

display(Markdown('__3) K-Means Grouped Biomarker__ - The labelling takes all the biomarkers selected to be part of the _K-Means Grouped Biomarker_ where similarily, the K-Means decides on the best clustering based on the intensity values of all of the biomarkers selected.'))

display(Markdown('## Data Transformations'))


# Run next code Cell
WidgFunc.SimpleClickProceedSettings()

6) Cell Biomarker Expression Data Settings

From the Biomarkers selected in Section 3, please select the Labelling Method, each method's associated settings,and the Data Transformation Option procedure per biomarker.

Labelling Method:

There are currently 3 labelling methods available:

1) Hard Thresholding by Biomarker - Where the biomarker is manually labelled where if the intensity value is less than the first threshold, the label will be 0 and if it is greater than the first threshold, it will be labelled as 1, etc.

To hard code a threshold, simply place the numbber of thresholds in the [biomarker] Number of Thresholds: integer box. That number of float boxes will appear underneath.

It is important to note that the first default threshold is the first quartile valur of the biomarker's intensity values.

2) K-Means by Biomarker - The labelling is done by the K-Means unsupervised learning algorithm, where only that biomarker is assessed for clustering at a time. Thereby, the K-Means is assessing where is the best clustering threshold. (https://bigdata-madesimple.com/possibly-the-simplest-way-to-explain-k-means-algorithm/).

3) K-Means Grouped Biomarker - The labelling takes all the biomarkers selected to be part of the K-Means Grouped Biomarker where similarily, the K-Means decides on the best clustering based on the intensity values of all of the biomarkers selected.

Data Transformations

In [15]:
# =========== Colour Choosing function ======================
def num2ColList(num):
    orignum = num
    if num in [3,4,5,6,7,8,9]:
        ColList = cl.to_rgb(cl.scales[str(num)]['qual']['Set1'])
    elif num <3:
        cols = cl.to_rgb(cl.scales[str(3)]['qual']['Set1'])
        ColList = cols[:num]
    elif (num >9) & (num < 16):
        cols = cl.to_rgb( cl.interp( cl.scales['9']['qual']['Set1'], 16 ) ) 
        ColList = cols[:num]
    else:
        # some random "num" values (eg. num = 93) causes random index error
        # to accommodate loop add one to the number and try again. Then only
        # return list[0:num]
        test = False
        ColList = []
        while test == False:
            try:
                ColList = cl.to_rgb( cl.interp( cl.scales['11']['qual']['Paired'], num ))

            except:
                #print('failed')
                num = num + 1
            if ColList != []:
                break # break infinite loop
        
    ColList = ColList[0:orignum]
        
    return ColList


#============= Updating Threshold Boxes ==========================
def UpdateThresholdContainers(args):
    description = args['owner'].description
    
    # identify which group is this
    for i, bm in enumerate(BMsfinal):
        if description == bm +' Number of Thresholds:':
            index = i
    
    children = list(AllContainer.children[index].children)
    Hbox = AllContainer.children[index].children[3]
    Oldchildren = list(Hbox.children)
    # get list of what's there
    OldVect = []
    oldthreshnum = 0
    for i, child in enumerate(Oldchildren):
        threshnum =  int(re.findall(r'\d+', child.description)[0])
        value = child.value

        if (threshnum - (i+1)) != 0:
            for j in range(0,threshnum - (i+1)):
                OldVect.append(((oldthreshnum+j+1),1))
            OldVect.append((threshnum,value))
        else:
            OldVect.append((threshnum,value))
            oldthreshnum = threshnum
    # add more to meet the new value
    if len(OldVect)<args['new']:
        oldlen = len(OldVect)
        for i in range(0,(args['new']-oldlen)):
            OldVect.append(((oldlen + i +1),1))
    # remove older threshnum
    elif len(OldVect) > args['new']:
        n = len(OldVect) - args['new']
        OldVect = OldVect[:len(OldVect)-n]

    # Make new thresholds
    NewChildren = []    
    for i, threshes in enumerate(OldVect):

        threshW = widgets.BoundedFloatText(
            value = threshes[1],
            description = 'Threshold '+ str(threshes[0]) +':',
            style = style,
            disabled = False,
            step = 0.001,
            min = 0,
            max = 1000000
        )

        NewChildren.append(threshW)
    # update
    children2 = NewChildren
                
    Hbox.children = children2
    children[3] = Hbox    
    #BMcontainer.children = children
    AllContainer.children[index].children = children


#================ Label Methods Radio Buttons ===============================
def SelectLabelMethod(args):
    description = args['owner'].description
    
    # identify which group is this
    for i, bm in enumerate(BMsfinal):
        if description == bm:
            index = i
    
    # re-enable all threshold widgets
    if args['new'] == 'Hard Thresholding by Biomarker':
        AllContainer.children[index].children[2].children[0].disabled = False
        for i in range(len(AllContainer.children[index].children[3].children)):
            AllContainer.children[index].children[3].children[i].disabled = False
        # disabling k-means number of clusters
        AllContainer.children[index].children[4].disabled = True
        # disabling Data Transformation
        AllContainer.children[index].children[5].value = 'None'
        AllContainer.children[index].children[5].disabled = True
    
    elif args['new'] == 'K-Means by Biomarker':
        # Disableing all threhsold clustering options
        AllContainer.children[index].children[2].children[0].disabled = True
        for i in range(len(AllContainer.children[index].children[3].children)):
            AllContainer.children[index].children[3].children[i].disabled = True
        # enabling k-means number of clusters
        AllContainer.children[index].children[4].disabled = False
        # enabling Data Transformation
        AllContainer.children[index].children[5].disabled = False
        
        
    else:
        AllContainer.children[index].children[2].children[0].disabled = True
        for i in range(len(AllContainer.children[index].children[3].children)):
            AllContainer.children[index].children[3].children[i].disabled = True
        # disabling k-means number of clusters
        AllContainer.children[index].children[4].disabled = True
        # enabling Data Transformation
        AllContainer.children[index].children[5].disabled = False

#============================================================================================================================
#=====================================================================================================================

#------------ Build Traces --------------------
traces = []
BMsfinal = list(QualSettingsDict['ProcessingSettings']['BMs'].keys())
colors = num2ColList(len(BMsfinal))
for i, bm in enumerate(BMsfinal):
    trace = go.Histogram(
        x = NewDF[bm],
        opacity = 0.75,
        name = bm,
        marker = dict(color = colors[i])
    )
    traces.append(trace)

    layout = go.Layout(
        barmode = 'overlay', 
        title = 'All Biomarkers Selected',
        legend=dict(orientation="h")
    )
fig = go.Figure(data = traces, layout=layout)

display(Markdown('### Double click on the stain in the legend to isolate the graph.'))

py.iplot(fig)

#**************** Build and Display All Widgets *******************
# Initialize for all (styling, initial widget format, etc.)--------------------------------------

style = {'description_width':'initial'}
BMsettings = []
AllContainerList = []

# Widgets to lock
WidgetList = []

#--- Building individual widget sets
for bm in BMsfinal:
    
    # widget to ask how many thresholds
    NumThreshes = widgets.BoundedIntText(
        continuous_update = True,
        value = 1,
        description = bm+' Number of Thresholds:',
        style = style,
        disabled = False,
        step = 1,
        min = 1,
        model_id = bm,
    )
    NumThreshes.observe(UpdateThresholdContainers,'value')
    
    # add widget to list
    WidgetList.append(NumThreshes)
    
    # Widget to Select Labelling Method
    LabelMethod = widgets.RadioButtons(
        options = ['Hard Thresholding by Biomarker','K-Means by Biomarker', 'K-Means Grouped Biomarker'],
        description = bm,
        )
    LabelMethod.observe(SelectLabelMethod,'value')
    # add widget to list
    WidgetList.append(LabelMethod)
    
    # Innitiating:
    contLabel = widgets.Label(value = (bm + ' Labelling Method Settings:'))

    NumThreshSubContainer = widgets.HBox()
    NumThreshSubContainer.children = [NumThreshes]
    NumThreshSubContainer.layout.flex

    # build initial threshold widget
    ThreshSubContainer = widgets.HBox()
    ThreshSubContainer.children = [widgets.BoundedFloatText(
        description = 'Threshold '+ '1' +':',
        style = style,
        disabled = False,
        step = 0.001,
        min = 0,
        value = NewDF[bm].quantile(0.25),
        max = 1000000000,
    )]
    # widget to ask how many thresholds
    NumKmeans = widgets.BoundedIntText(
        continuous_update = True,
        value = 2,
        description = bm+' K-Means Number of Clusters:',
        style = style,
        disabled = True,
        step = 1,
        min = 2,
        #model_id = bm,
    )
    WidgetList.append(NumKmeans)
    
    # Widget to Select Data Transformations
    DataTrans = widgets.RadioButtons(
        options = ['None','Normalize', 'Log then Normalize'],
        description = bm + ' Data Transformations',
        style = style,
        value = 'None',
        disabled = True,
    
    )
    
    BMcontainer = widgets.VBox([contLabel,
                                LabelMethod,
                                NumThreshSubContainer, 
                                ThreshSubContainer,
                                NumKmeans,
                                DataTrans,
                               widgets.Label('___________________________________________________________________________'),])
    BMcontainer.layout.display = 'flex'
    
    # add all containers to AllContainer
    AllContainerList.append(BMcontainer)
    BMsettings.append((bm,[]))

AllContainer = widgets.VBox(AllContainerList)

display(AllContainer)

#%%%%%%% Lock Widgets and Move Workflow Forward %%%%%%%%%%%%%%%%%%%
#-------- on toggle -------------
global NumberLocked5
NumberLocked5 = 0

def ExecuteOnLock():
    for index in range((len(AllContainer.children))): # the last widget is the toggle
        AllContainer.children[index].children[2].children[0].disabled = True
        AllContainer.children[index].children[1].disabled = True

        # disabling k-means number of clusters
        AllContainer.children[index].children[4].disabled = True
        # disabling Data Transformation number of clusters
        AllContainer.children[index].children[5].disabled = True

        bm = AllContainer.children[index].children[1].description

        ThreshVect = []
        for i in range(len(AllContainer.children[index].children[3].children)):
            AllContainer.children[index].children[3].children[i].disabled = True
            ThreshVect.append(AllContainer.children[index].children[3].children[i].value)

        # ** Collect data for QualSettingsDict
        # Label Method:
        if (AllContainer.children[index].children[1].value) == 'Hard Thresholding by Biomarker':
            QualSettingsDict['ProcessingSettings']['BMs'][bm]['LabelMethod'] = [(AllContainer.children[index].children[1].value), list(set(ThreshVect))]
        elif (AllContainer.children[index].children[1].value) == 'K-Means by Biomarker':
            QualSettingsDict['ProcessingSettings']['BMs'][bm]['LabelMethod'] = [(AllContainer.children[index].children[1].value), [AllContainer.children[index].children[4].value]]
        else:
            QualSettingsDict['ProcessingSettings']['BMs'][bm]['LabelMethod'] = [(AllContainer.children[index].children[1].value)]
        # Data Transformation:
        QualSettingsDict['ProcessingSettings']['BMs'][bm]['DataTransform'] = AllContainer.children[index].children[5].value

    # Running the next 2 cells once
    global NumberLocked5
    if NumberLocked5 ==0:
        # Run next markdown cell
        display(Javascript('IPython.notebook.execute_cell_range(IPython.notebook.get_selected_index()+1, IPython.notebook.get_selected_index()+2)'))
    NumberLocked5 = NumberLocked5 + 1
        
def ExecuteOnUnlock():
    # Unlocking subwidgets per biomarker
    for index in range(0,len(AllContainer.children)): # the last widget is the toggle
        # Renable Labelling Method Radio Button
        AllContainer.children[index].children[1].disabled = False
        # Renable Data Transformation Radio Button
        AllContainer.children[index].children[5].disabled = False
        # unlock thresholding widgets
        if AllContainer.children[index].children[1].value == 'Hard Thresholding by Biomarker':

            for i in range(len(AllContainer.children[index].children[3].children)):
                AllContainer.children[index].children[2].children[0].disabled = False
                AllContainer.children[index].children[3].children[i].disabled = False
        # unlock k-means widgets
        elif AllContainer.children[index].children[1].value == 'K-Means by Biomarker':
                AllContainer.children[index].children[4].disabled = False

#--- Calling Toggle ----
WidgetList = AllSlides
WidgFunc.SimpleToggleLockSettings([], ExecuteOnLock, ExecuteOnUnlock)

Double click on the stain in the legend to isolate the graph.

In [16]:
display(Markdown('# 7)	Data Processing and Visualization'))


# Run next code Cell
WidgFunc.SimpleClickProceedSettings()

7) Data Processing and Visualization

In [25]:
##%%%%%%%%%%%%% Function to Transform Data %%%%%%%%%%
def Raw2TransformData(rawdata, TransformMethod):
    # no transformation required by user - return the same values
    if TransformMethod == 'None':
        newdata = rawdata

    # Normalize raw data
    elif TransformMethod == 'Normalize':
        newdata = rawdata/max(rawdata)

    # Log the data then normalize
    elif TransformMethod == 'Log then Normalize':
        newdata = np.log2(rawdata)
        newdata = newdata/max(newdata)
    
    return newdata

# Unit test        
"""bm = 'COX2'
rawdata = NewDF[bm].values
TransformMethod = QualSettingsDict['ProcessingSettings']['BMs'][bm]['DataTransform']

test = Raw2TransformData(rawdata,TransformMethod)
test"""

#%%%%%%%%%%%%%%% Label Data based on Settings %%%%%%%%%%%%%%%%
# loop through each biomarker and individually label cells based on thresholds
# Initializing new labeled dataframe
LabelDF = []
LabelDF = NewDF.loc[:,('CellID','slide','region','epithelial',
                        'NominalPostion_X','NominalPosition_Y')]
# string for future Cell Type order - only relevant for the non-continous settings (threshold and k-means)
LabelCellTypeString = ''
LabelCellCols = []
ContLabelCellTypeString = ''
ContLabelCellCols = []


for bm in QualSettingsDict['ProcessingSettings']['BMs']:
    # ****&&&& Transform Data
    rawdata = NewDF[bm].values
    TransformMethod = QualSettingsDict['ProcessingSettings']['BMs'][bm]['DataTransform']

    newdata = Raw2TransformData(rawdata, TransformMethod)
        
    # ****Thresholding Labelling
    if QualSettingsDict['ProcessingSettings']['BMs'][bm]['LabelMethod'][0] == 'Hard Thresholding by Biomarker':
        LabelCellTypeString = LabelCellTypeString + bm + '_'
        LabelCellCols.append(bm+'_ThreshLabel')
                
        # Add Column of zeros in LabelDF
        LabelDF[(bm+'_ThreshLabel')] = 0
        # sort the inputs and remove duplicate threshold inputs
        vect = sorted(list(set(QualSettingsDict['ProcessingSettings']['BMs'][bm]['LabelMethod'][1])))
        #print(vect)
        for i, thresh in enumerate(vect):
            # if it is the first bin
            if i == 0:
                LabelDF[(bm+'_ThreshLabel')].iloc[[(NewDF[bm] < thresh)]] = i + 1
                prevthresh = thresh
            # middle bins:
            elif i > 0 & i < (len(vect)-1):
                LabelDF[(bm+'_ThreshLabel')].iloc[[((NewDF[bm] >= prevthresh) & 
                       (NewDF[bm] < thresh))]] = i + 1
                prevthresh = thresh
            # last bin
            elif i == (len(vect)-1):
                LabelDF[(bm+'_ThreshLabel')].iloc[[(NewDF[bm] >= thresh)]] = i + 1
        
    # ****K-Means by Biomarker
    elif QualSettingsDict['ProcessingSettings']['BMs'][bm]['LabelMethod'][0] == 'K-Means by Biomarker':
        LabelCellTypeString = LabelCellTypeString + bm + '_'
        LabelCellCols.append(bm+'_KmeansLabel')
        
        K = QualSettingsDict['ProcessingSettings']['BMs'][bm]['LabelMethod'][1][0]
        #print(K)
        # Label Column is continuous to start:
        k_model_n2 = KMeans(n_clusters = K, random_state=1).fit(newdata.reshape(-1,1))
        LabelDF[(bm+'_KmeansLabel')] = k_model_n2.labels_
    
    # ****K-Means Grouped Biomarker
    elif QualSettingsDict['ProcessingSettings']['BMs'][bm]['LabelMethod'][0] == 'K-Means Grouped Biomarker':
        LabelDF[(bm+'_ContVal')] = newdata
        ContLabelCellCols.append(bm+'_ContVal')
        ContLabelCellTypeString = ContLabelCellTypeString + bm + '_'

# removing last character of string (extra underscore)
LabelCellTypeString = LabelCellTypeString[:-1]        
ContLabelCellTypeString = ContLabelCellTypeString[:-1]

# ============= Organize Cell Type Dataframe with only Thresholding Method and K-means values ===============
if LabelCellTypeString != '':
    # Identify cell types
    CellTypeDF = LabelDF.loc[ :, (LabelCellCols)]
    CellTypesArray = np.unique(CellTypeDF.values, axis=0)

    # labelling each row as the cell type
    CTdata = np.zeros(len(CellTypeDF.iloc[:,0])).astype(int)
    for i in range(len(CellTypesArray)):
         CTdata[(CellTypeDF.values == CellTypesArray[i]).all(axis=1)] = i + 1

    CellTypeDF['CellType_KThesh'] = CTdata
    NewDF['CellType_KThesh'] = CTdata

    #%%%%%%%%%%%%%%%%%%%%% Display Cell Type Legend %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    LegendDF = pd.DataFrame(data = CellTypesArray, columns = list(CellTypeDF.columns)[:-1])

    maxim = len(CellTypesArray)
    cmap_pie = plt.cm.get_cmap('nipy_spectral', maxim)

    ct_colors = []

    for n in range(0,maxim):
        ct_colors.append(cmap_pie(n))
        ct_colors[n] = "rgba" + str((ct_colors[n][0]*255, ct_colors[n][1]*255, ct_colors[n][2]*255, ct_colors[n][3]))    

    def highlightCell(x):

        y = pd.DataFrame('', index=x.index, columns = x.columns)
        for i in x.index:
            color = 'background-color: ' + ct_colors[i]
            y.iloc[i,-1] = color
        return y
    LegendDF['Color'] = '   '
    ColoredLegendDF = LegendDF.style.apply(highlightCell,axis=None)
    display(Markdown('## The following are the Cell Types found through the biomarker labelling analysis:'))
    display(ColoredLegendDF)

#%%%%%%%%%%%%%%%%%%%  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

# ====== Threshold Cell Naming ======
if LabelCellTypeString != '':

    #88888888 Pie Chart 8888888888888888
    # count how many of each type exists
    CTbreakdown = CellTypeDF['CellType_KThesh'].value_counts().sort_index().to_frame()
    labels = [('Cell Type ' + str(cname)) for cname in CTbreakdown.index.tolist()]
    values = CTbreakdown['CellType_KThesh']


    trace = go.Pie(labels = labels, values = values, marker = dict(colors = ct_colors), textinfo= 'none')

    display(Markdown('### Cell Type Breakdown of Threshold and K-Means Method:'))

    py.iplot([trace])
   

    ######################################################################################################
        ##############################################################################################
    ######################################################################################################


    #88888888888888   CORRECT PLOTTING TECHNIQUE VIA MATPLOT LIB    888888888888888
    
    #LABEL FIGURE
    display(Markdown('### Plotting:'))
    
    # Read image
    im_in = cv2.imread(DAPIfile.files[0], cv2.IMREAD_GRAYSCALE);

    # INITIALIZE FILEPATH AND NAME OF THE INPUT IMAGE TO SET UP OUTPUT FILE NAME
    image_path = DAPIfile.files[0]
    image_name = os.path.basename(image_path)
    outfile = "Visualization___" + os.path.splitext(image_name)[0] + ".png" #Naming convention for output?
    
    # redefining celltype DF
    coor_celltypeDF = pd.concat([LabelDF[["NominalPostion_X","NominalPosition_Y"]], CellTypeDF['CellType_KThesh']], axis=1)
    coor_celltypeDF_sorted = coor_celltypeDF.sort_values(by=["CellType_KThesh"])

    #PLOT IMAGE
    fig1 = plt.figure(figsize=(15,15))
    implot = plt.imshow(im_in, 'gray', alpha=0.65)
    ax1 = fig1.add_subplot(111)
    
    #SET UP COLORMAP
    number_of_plots=max(CellTypeDF['CellType_KThesh'])
    colormap = plt.cm.nipy_spectral #I suggest to use nipy_spectral, Set1, Paired
    ax1.set_prop_cycle('color',[colormap(i) for i in np.linspace(0, 1,number_of_plots)])
    
    #PLOT POINTS
    prevelem = 0
    for n in range(1, number_of_plots):
        nextelem = (coor_celltypeDF_sorted["CellType_KThesh"] == n).sum()+prevelem
        X = coor_celltypeDF_sorted["NominalPostion_X"][prevelem:nextelem]
        Y = coor_celltypeDF_sorted["NominalPosition_Y"][prevelem:nextelem]
        ax1.plot(Y,X, label = n, linewidth = 0.0, marker = 'o', markersize='5')
        prevelem=nextelem
    ax1.autoscale()

    #OUTPUT IMAGE TO DESIRED DIRECTORY
    fig1.savefig(outfile, bbox_inches='tight')
    
    #SHOW PLOT
    fig1.show()
    
    


    ######################################################################################################
        ##############################################################################################
    ######################################################################################################


    """# ====== Continuous Cell Naming Functions ======    
    # function to automatically assess optimal K in K-means
    def pickKmeansClusters(indata,minClus, highestNumCl):
        progressbar = widgets.FloatProgress(min = minClus, max=highestNumCl, 
                                            description = 'Assessing K:')
        display(progressbar)
        listOfMetric=[]
        metricDF = pd.DataFrame()
        index = []

        for clus in (range(minClus, (highestNumCl + 1))):
            kmeans_model = KMeans(n_clusters=clus, random_state=1).fit(indata)
            labels = kmeans_model.labels_
            listOfMetric.append(metrics.calinski_harabaz_score(indata, labels))
            index.append(clus)
            progressbar.value = clus

        indexMin=np.argmax(listOfMetric)

        numOfClus=minClus+indexMin

        metricDF['NumofClusts'] = index 
        metricDF['CalinskiHarabazScore'] = listOfMetric

        progressbar.close()
        return (numOfClus, metricDF)

    # ====== Continuous Cell Naming ======
    if ContLabelCellTypeString != '':
        display('______________________________________________________________')
        display(Markdown('# K-Means by Grouped Biomarkers with:'))
        # ----------- minClus and highestNumCl Widgets ------------------
        minClusWidg = widgets.BoundedIntText(
            description = 'Minimum number of clusters:',
            min  = 2,
            style = {'description_width':'initial'}
        )
        display(minClusWidg)
        maxClusWidg = widgets.BoundedIntText(
            description = 'Maximum number of clusters:',
            min  = 3,
            style = {'description_width':'initial'}
        )
        display(maxClusWidg)

        # Update button
        UpdateRegsToggle = widgets.Button(
            description = 'Launch Assessment',
            button_style = 'warning',
            disabled = False,
            value = True

        )


        # ---------- Identify Number of Clusters ------------
        def AssessKMeans(args):
            # Get range of K - even if user has the max larger than the min
            Krange = sorted([minClusWidg.value, maxClusWidg.value])
            display(Markdown('_______________________________________________'))
            display(Markdown(('#### Assessing possible K between K = ' + str(Krange[0]) + ' to K = ')+ str(Krange[1]) + ':'))

            ContCellTypeDF = LabelDF.loc[ :, ContLabelCellCols]



            NumofClus, metricDF = pickKmeansClusters(ContCellTypeDF.values, Krange[0], Krange[1])

            Trace = go.Scatter(
                x = metricDF['NumofClusts'],
                y = metricDF['CalinskiHarabazScore'],
                mode = 'lines+markers',
            )
            fig = dict(data = [Trace], layout=dict(
                title = 'Calinski Harabaz Score vs. Number of K-Means Clusters',
                xaxis = dict(title = 'Number of K-Means Clusters'),
                yaxis = dict(title = 'Calinski Harabaz Score')
            ))
            py.iplot(fig)

            display(Markdown('### Therefore the optimal K = ' + str(NumofClus)))


            # --------- Label each cell with chosen K-Means ---------------------
            Kmodel = KMeans(n_clusters = NumofClus, random_state = 1).fit(ContCellTypeDF.values)

            ContCellTypeDF['CellType_Cont'] = Kmodel.labels_
            NewDF['CellType_Cont'] = Kmodel.labels_

        display(UpdateRegsToggle)

        UpdateRegsToggle.on_click(AssessKMeans)





    #%%%%%%%%%%%%%%%%%%% EXPORTING DICT TO .XML %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    from dicttoxml import dicttoxml
    DictToXML_output = dicttoxml(QualSettingsDict, attr_type=False)



    # ====== DICT TO .TXT ======
    #WriteDictToTxt = open("dict.txt","w")
    #WriteDictToTxt.write( str(QualSettingsDict) )
    #WriteDictToTxt.close()

    # ====== .TXT TO .XML ======
    """
    
# Run next markdown cell
display(Javascript('IPython.notebook.execute_cell_range(IPython.notebook.get_selected_index(), IPython.notebook.get_selected_index()+1)'))
C:\Users\User\Anaconda3\lib\site-packages\pandas\core\internals.py:940: FutureWarning:

Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.

The following are the Cell Types found through the biomarker labelling analysis:

ER_ThreshLabel PR_ThreshLabel KI67_KmeansLabel Color
0 0 0 0
1 0 0 1
2 0 1 0
3 0 1 1
4 1 0 0
5 1 0 1
6 1 1 0
7 1 1 1

Cell Type Breakdown of Threshold and K-Means Method:

Plotting:

C:\Users\User\Anaconda3\lib\site-packages\matplotlib\cbook\deprecation.py:107: MatplotlibDeprecationWarning:

Adding an axes using the same arguments as a previous axes currently reuses the earlier instance.  In a future version, a new instance will always be created and returned.  Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.

C:\Users\User\Anaconda3\lib\site-packages\matplotlib\figure.py:457: UserWarning:

matplotlib is currently using a non-GUI backend, so cannot show the figure

C:\Users\User\Anaconda3\lib\site-packages\plotly\matplotlylib\renderer.py:402: UserWarning:

Aw. Snap! You're gonna have to hold off on the selfies for now. Plotly can't import images from matplotlib yet!

In [27]:
CellCluster_outputDF = pd.concat([LabelDF[["slide","region","NominalPostion_X","NominalPosition_Y"]], CellTypeDF['CellType_KThesh']], axis=1)
CellCluster_outputDF.to_csv('DataPoints_Clustering.csv')

display(Markdown('### Selected data points exported to output folder.'))
display(Markdown('## End of Program.'))

Selected data points exported to output folder.

End of Program.

In [ ]: